Abstract

An 8-Gb/s 0.3-µm CMOS transceiver uses multilevel
signaling (4-PAM) and transmit pre-shaping in combination
with receive equalization to reduce ISI due to channel
low-pass effects. High on-chip frequencies are avoided by
multiplexing and demultiplexing the data directly at the pads.
Timing recovery takes advantage of a novel frequency
acquisition scheme and a linear PLL with a loop bandwidth
>30MHz, phase margin >48°, and a capture range of 20MHz
without a frequency acquisition aid. The transmitted 8-Gbps
data is successfully detected by the receiver after a 10-m
coaxial cable. The 2mm x 2mm chip consumes 1.1W at
8Gbps with a 3-V supply.

Introduction 

As the demand for higher data-rate communication
increases, low-cost, high-speed serial links using copper cables
become more attractive for distances of 1 to 10 meters [1], [2].
For multi-Gbps applications, the data rate is limited by the 
cable skin-effect loss and the process technology. The 10-m
coaxial cable (PE-142LL) used in this work has a 3-dB
bandwidth of 1.2GHz. This design differs from existing Gbps
links [1],[2] in its use of a receiver equalizer in combination 
with a transmitter filter to compensate for the cable 
characteristics. High on-chip frequencies are avoided by 
multiplexing and demultiplexing the data directly at the pads. 
To reduce the symbol rate, 4-level pulse amplitude
modulation (4-PAM) is used. A new proportional phase
detector for data recovery is proposed which does not suffer 
from the stability and bandwidth limitations of traditional 
bang-bang loops. A novel frequency acquisition architecture is 
designed to enable the receive PLL to lock to the input stream 
under all process variations. 

System Architecture 

Implementing optimal detection methods for multi-Gbps
rates demands high complexity and large area [3]. Instead,
square pulses, which can be generated and detected with
modest complexity, are used as the basic communication
symbols [4]. At rates well above the channel bandwidth,
however, square pulses result in severe intersymbol
interference (ISI), which reduces the data-eye openings. For a
given data rate, the 4-PAM scheme halves the symbol rate
compared to a conventional 2-PAM system. This symbol
rate reduction lowers not only the ISI in the channel, but also
the maximum required on-chip clock frequency.
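The halving of the symbol rate follows from carrying two bits per 4-PAM symbol. The sketch below illustrates this under stated assumptions: the paper does not specify the bit-to-level mapping, so the Gray-coded table `GRAY_4PAM` and the helper names are hypothetical.

```python
# Hypothetical Gray-coded 4-PAM mapping (assumption: the paper does not give the
# bit-to-level table). Adjacent levels differ in one bit.
GRAY_4PAM = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}


def pam4_encode(bits):
    """Map an even-length bit list to 4-PAM levels, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("4-PAM needs an even number of bits")
    return [GRAY_4PAM[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def symbol_rate(bit_rate_bps, bits_per_symbol):
    """Symbol (baud) rate needed to carry bit_rate_bps."""
    return bit_rate_bps / bits_per_symbol


print(pam4_encode([1, 0, 0, 0, 1, 1, 0, 1]))  # [3, -3, 1, -1]
print(symbol_rate(8e9, 1) / 1e9)  # 2-PAM: 8.0 Gsym/s
print(symbol_rate(8e9, 2) / 1e9)  # 4-PAM: 4.0 Gsym/s
```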

To invert the channel, a 1-tap equalizer at the receiver and
a 2-tap pre-emphasis filter at the transmitter are used. The
effects of these filters on a 0.2-ns pulse (5Gsym/s) at the near
and far ends of the 10-m channel are shown in Fig. 1. The
unfiltered pulse response remains at a large value 0.2ns after its
peak (the next symbol sample point), while the pre-shaped,
equalized signal is zero at that point. The filter tap weights can be
programmed for different channels.

Fig. 1 Pulse shape with and without filtering
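Why a 2-tap transmit filter can null the pulse response at the next sample point can be seen with a deliberately simplified channel. The sketch assumes a first-order low-pass channel with symbol-spaced response (1 - a)·a^n (the real cable is skin-effect limited, so this is only an illustration), and a hypothetical decay `a = 0.6`. A pre-emphasis FIR with taps [1, -a] then cancels every post-cursor.

```python
def convolve(x, h):
    """Full discrete convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def first_order_channel(a, n_taps):
    """Assumed symbol-spaced pulse response of a first-order low-pass channel."""
    return [(1 - a) * a**n for n in range(n_taps)]


a = 0.6                       # hypothetical post-cursor decay per symbol
raw = first_order_channel(a, 8)
pre_emphasis = [1.0, -a]      # 2-tap transmit FIR: main tap, negative post-cursor tap

# Keep only the first len(raw) points; the last one is a truncation artifact.
shaped = convolve(pre_emphasis, raw)[: len(raw)]

print([round(v, 3) for v in raw[:3]])               # [0.4, 0.24, 0.144]
print(max(abs(v) for v in shaped[1:]) < 1e-12)      # True: no residual ISI
```

In this model the pulse amplitude drops to (1 - a) in exchange for zero ISI, which mirrors the swing-versus-ISI trade-off of pre-emphasis in general.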

The on-chip frequency requirement is further reduced to
1/5 the symbol rate (1/10 the bit rate) by performing 5:1
multiplexing and 1:5 demultiplexing directly at the chip pads,
allowing 5 symbols to be transmitted every clock cycle. The 5
symbols correspond to 10 bits and comprise 4 data symbols and
1 symbol for line coding. In this design, coding is performed
on chip to provide enough transitions for clock recovery.
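The rate arithmetic above can be checked directly. This is a small sketch with hypothetical constant names; it assumes the quoted bit rate counts all 10 bits per clock cycle, including the line-code symbol, so the payload rate is 4/5 of it.

```python
BITS_PER_SYMBOL = 2          # 4-PAM
SYMBOLS_PER_CYCLE = 5        # 5:1 multiplexing at the pads
DATA_SYMBOLS_PER_CYCLE = 4   # the fifth symbol carries the line code


def link_rates(bit_rate_bps):
    """Return (symbol rate, on-chip clock rate, payload bit rate)."""
    symbol_rate = bit_rate_bps / BITS_PER_SYMBOL
    clock = symbol_rate / SYMBOLS_PER_CYCLE
    payload = bit_rate_bps * DATA_SYMBOLS_PER_CYCLE / SYMBOLS_PER_CYCLE
    return symbol_rate, clock, payload


sym, clk, payload = link_rates(8e9)
print(sym / 1e9, clk / 1e6, payload / 1e9)  # 4.0 800.0 6.4
```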

Circuit Implementation 

The block diagram of the complete transceiver chip is 
depicted in Fig. 2. The transmitter, comprising 5 identical 
drivers, uses different clock phases from a 5-stage differential 
ring oscillator (TX-VCO) to multiplex the data stream onto the 
50-Ω line. The detailed transmitter design is described in [5].

The receiver performs 1:5 demultiplexing by sampling the
signal with 5 of the 10 clock phases from a 5-stage differential
ring oscillator (RX-VCO). The 5 alternate clock phases
provide the samples required by the input equalizer, whose
taps are spaced half a symbol apart, and allow 2x oversampling
to recover timing. The samples are then filtered by the 1-tap input
equalizer, which differentially subtracts the weighted value of
the previous sample from the present sample. This operation is
done by current-summing two differential values with opposite
polarity (Fig. 3). The equalizer further improves the data-eye
area (height x width) by up to 40% by sharpening the signal
transitions. The ADC samplers are similar to those used in [7].
Fig. 2 Transceiver general architecture

When the receive PLL is properly locked to the input data,
half of the 10 analog equalizer outputs represent sampled
values at the centers of the symbols (Sc), and half are samples at
the data transitions (Se). The Sc samples are digitized by 2-bit
flash ADCs, yielding the received data bits, which are then
resynchronized to a global clock. The Se samples are buffered
by linear amplifiers and used as part of a linear phase detector
for timing recovery. The samples are pipelined to
match the delays of the different signal paths corresponding to
Sc and Se.
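The split of each cycle's 10 samples into center (Sc) and edge (Se) samples, and the 2-bit flash conversion of the centers, can be modeled as below. This is a sketch under assumptions: normalized levels of ±1 and ±3, decision thresholds midway between levels, even sample positions at symbol centers, and a natural-binary index-to-bit mapping; none of these details are specified in the paper, and all names are hypothetical.

```python
# Assumed normalized 4-PAM levels -3, -1, +1, +3; thresholds midway between them.
THRESHOLDS = (-2.0, 0.0, 2.0)


def flash_adc_2bit(sample):
    """Model a 2-bit flash ADC: three comparators form a thermometer code,
    and the number of comparators that trip is the symbol index 0..3."""
    return sum(1 for t in THRESHOLDS if sample > t)


def index_to_bits(index):
    """Hypothetical natural-binary mapping of a symbol index to (MSB, LSB)."""
    return (index >> 1) & 1, index & 1


def split_samples(samples):
    """Split one cycle of 10 half-symbol-spaced samples into (Sc, Se),
    assuming lock with even positions at symbol centers and odd ones at edges."""
    if len(samples) != 10:
        raise ValueError("expected 10 samples per cycle")
    return samples[0::2], samples[1::2]


cycle = [2.9, 1.0, -1.1, -2.0, -2.8, -1.0, 0.8, 1.9, 3.1, 0.2]
centers, edges = split_samples(cycle)
print([flash_adc_2bit(s) for s in centers])  # [3, 1, 0, 2, 3]
```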

Timing recovery uses data transitions to detect the phase
between the data and the sampling clocks. In a differential
4-PAM stream, there are 3 distinct transition types (Fig. 4). Of
these 3 types, only type I makes a transition to the same
magnitude but opposite polarity, which results in a zero
crossing exactly at the midpoint between two
symbols; this type can therefore be used for clock recovery.
The two other types are ignored, as they convey wrong phase
information. In every cycle (5 symbols), one type I transition is
guaranteed by the transmitter's 4/5sym encoder.
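Identifying the useful (type I) transitions only requires comparing neighboring symbol values, which is what the decision logic gets from the 2-bit ADC outputs. The sketch below assumes levels ±1 and ±3 and only separates type I from "ignored"; it does not reproduce how Fig. 4 divides the other two types. Function names are hypothetical.

```python
LEVELS = (-3, -1, 1, 3)  # assumed normalized 4-PAM levels


def is_type_i(prev, nxt):
    """True if the transition goes to the same magnitude with opposite polarity,
    so (for a symmetric pulse) it crosses zero midway between the two symbols."""
    return prev == -nxt and prev != 0


def useful_transitions(symbols):
    """Indices k where symbols[k] -> symbols[k + 1] is a type I transition."""
    return [k for k in range(len(symbols) - 1) if is_type_i(symbols[k], symbols[k + 1])]


print(useful_transitions([3, -3, -1, 1, 1]))  # [0, 2]
# 4 of the 16 possible symbol pairs are type I.
print(sum(is_type_i(a, b) for a in LEVELS for b in LEVELS))  # 4
```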

Traditional linear data PLLs offer good loop stability and
bandwidth, but most suffer from a systematic phase offset.
Sampling transitions with the same mechanism as the data
reduces the systematic phase offset in data recovery. However,
conventional sampling digital loops use bang-bang control,
resulting in limited bandwidth and stability [6]. Fig. 5a shows
the block diagram of the data Phase/Freq. detector used here to
overcome these problems. When the data loop is in lock, the
analog edge samples (Se) at type I transitions are zero, resulting
in zero sum current at the input of charge-pump I (VP = 0). Thus,
ideally no charge is pumped into the VCO control line in
lock, and there is therefore no phase error due to control-voltage
ripple. When not in lock, the type I edge samples occur before
(Early) or after (Late) the zero crossings, resulting in non-zero
Se values. Based on the previous and next symbol
values (2-bit data from the ADCs), the decision logic of each stage
adds the analog Se value of a type I transition, with the correct
polarity, to the control voltage of the loop, and ignores the other
two transition types by turning off the current of that stage (Fig. 5a).
The chosen decision-logic scheme introduces a problem,
discussed in the Measurements section. Because the Se values are
proportional to the loop phase error, the correction on the loop
control voltage is also proportional to the phase error; the phase
detector is therefore linear. This PLL thus combines the
advantages of both a linear and a sampling loop. The loop gain,
and consequently the bandwidth, increases with the number of
useful (type I) transitions per cycle. Assuming a random data
sequence with a 4/5sym encoder, the average type I transition
density is two per cycle, which results in a loop bandwidth of
30MHz (BW/fref > 0.06) and a phase margin >48° with no
systematic phase offset.
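The proportional behavior of the detector can be illustrated numerically. The sketch assumes a hypothetical channel model in which each transition is a straight line between symbol centers, and an edge sample taken at the nominal midpoint shifted by `tau` symbol periods (positive means late). Each stage flips the sign of its edge sample for falling transitions and contributes nothing for non-type-I transitions; the summed output is then zero at the correct phase and grows linearly with `tau`, unlike the sign-only output of a bang-bang detector.

```python
def stage_contribution(prev, edge_sample, nxt):
    """Signed phase-error contribution of one stage (positive = clock late).
    Stages without a type I transition turn their current off (contribute 0)."""
    if not (prev == -nxt and prev != 0):
        return 0.0
    direction = 1.0 if nxt > prev else -1.0   # rising vs falling transition
    return direction * edge_sample


def phase_detector(centers, edges):
    """Sum of stage contributions; edges[k] lies between centers[k] and centers[k + 1]."""
    return sum(stage_contribution(centers[k], edges[k], centers[k + 1])
               for k in range(len(edges)))


def linear_edge_sample(prev, nxt, tau):
    """Hypothetical model: linear ramp between symbol centers, sampled at 0.5 + tau."""
    return prev + (nxt - prev) * (0.5 + tau)


def pd_output(symbols, tau):
    edges = [linear_edge_sample(symbols[k], symbols[k + 1], tau)
             for k in range(len(symbols) - 1)]
    return phase_detector(symbols, edges)


symbols = [3, -3, -1, 1, 1, -1]           # three type I transitions
print(round(pd_output(symbols, 0.1), 6))  # 1.0
```

In this model the output equals 10·tau for the pattern shown, and adding more type I transitions raises the slope, consistent with the loop gain growing with transition density.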

As the phase detector has a limited frequency capture
range, a frequency acquisition aid is employed to help acquire
lock to a reference clock at start-up (Fig. 5a). The circuit, shown
in Fig. 5b, uses cycle-slipping information when the RX-VCO
frequency differs from that of the incoming data. During
cycle slipping, sweeping of the clock phase causes the phase
detector output (VP) to oscillate between Early and Late
signals. The frequency of this oscillation (sweep speed) is
equal to the frequency difference between the receive clocks
and the incoming data. If there is a considerable frequency
difference, the oscillations at VP cause the edge detector
(Fig. 5b) to produce pulses that continuously set the one-shot
circuit output (Vone) to one. Once the VCO frequency is close enough
to the incoming data frequency (within the data PLL capture
range), the pulse rate of the edge detector decreases such that
Cone can charge high enough to switch Vone to zero. At the
falling edge of Vone, VQ, which is reset to zero at start-up, is
asserted and hands loop control to the data phase detector.
The edge detector is designed to have hysteresis (Fig. 6), using
positive feedback in its first-stage amplifier. Thus, it reacts only
to oscillation amplitudes larger than a certain threshold level,
which helps prevent erroneous transitions due to noise.

Fig. 5 a) Phase/Freq. detector architecture b) Loop switching decision logic
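The hand-off decision can be mimicked with a simple behavioral model. The sketch assumes VP is a unit-amplitude sine at the beat frequency, models the edge detector as a comparator with hysteresis (a Schmitt trigger with hypothetical threshold `threshold`), and models the one-shot as retriggerable with a hypothetical hold time `t_one_s` standing in for the Cone charging time. The frequency difference is held fixed here, whereas in the real loop it shrinks as the reference-clock PLL pulls the VCO.

```python
import math


def handoff_time(delta_f_hz, t_one_s, threshold=0.3, t_end_s=20e-6, dt_s=1e-9):
    """Return the time at which VQ would assert (hand-off to the data loop),
    or None if the one-shot keeps being retriggered until t_end_s."""
    state = False        # edge-detector output with hysteresis
    last_pulse = 0.0     # one-shot output Vone starts high at start-up
    t = 0.0
    while t < t_end_s:
        vp = math.sin(2 * math.pi * delta_f_hz * t)   # idealized beat at VP
        # Hysteresis: switch only when VP crosses +threshold or -threshold.
        if not state and vp > threshold:
            state, last_pulse = True, t               # pulse retriggers one-shot
        elif state and vp < -threshold:
            state, last_pulse = False, t
        if t - last_pulse > t_one_s:                  # Cone charged: Vone falls
            return t                                  # falling edge sets VQ
        t += dt_s
    return None


print(handoff_time(10e6, 1e-6))               # None: far from lock, stays on ref loop
print(handoff_time(50e3, 1e-6) is not None)   # True: slow beat lets Vone fall
```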

A 2^7-1 PRBS encoder and decoder, as well as a scannable
transmit/receive data register, are provided on chip for BER
testing. The 5/4sym decoder removes the extra line-code
symbol (Fig. 2).
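A 2^7-1 PRBS is typically produced by a 7-bit linear-feedback shift register. The sketch below assumes taps at stages 7 and 6 (the polynomial x^7 + x^6 + 1 commonly used for PRBS7); the paper does not state which polynomial the on-chip generator uses. The `bit_errors` helper is a hypothetical checker that counts mismatches against a locally generated reference, as a BER test would.

```python
def prbs7(seed=0x7F, n_bits=127):
    """Fibonacci LFSR with taps at stages 7 and 6 (x^7 + x^6 + 1, assumed).
    Any non-zero 7-bit seed gives a maximal-length sequence of period 127."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # XOR of stages 7 and 6
        state = ((state << 1) | new_bit) & 0x7F
        out.append(new_bit)
    return out


def bit_errors(received, reference):
    """Count bit mismatches between a received stream and the reference."""
    return sum(r != t for r, t in zip(received, reference))


seq = prbs7(n_bits=254)
print(len(seq), sum(seq[:127]))  # 254 64
```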

Measurements 

The transmitter achieves a symbol rate of 5Gsym/s (10Gb/s)
with an eye opening of 200mV and 90ps, and 4Gsym/s
(8Gbps) with an eye opening of 350mV and 110ps over 10
meters of coaxial cable, using pre-emphasis (Fig. 6). Without
pre-emphasis, symbols after the 10-m cable show an eye
opening of 60-mV height and 50-ps width at 4Gsym/s. The
transmitter output has an adjustable amplitude with a
maximum of 1.2V, and a jitter of 11ps (p-p) and 2ps (rms).

The receiver successfully detects an 8-Gbps 4-PAM data
stream after 10 meters with a 3-V supply. Receiver equalization
helps reduce the transmitter pre-emphasis needed for the 10-m
cable, effectively allowing the use of longer cables for the
link. At data rates higher than 8Gbps, the receive PLL fails
due to increased high-frequency noise in the loop. Raising the
supply to 3.3V allows the receiver to operate at up to 9Gbps.
The decision logic of the data phase detector injects undesired
charge onto the VCO control line, causing errors in the sampling
clock phases and in data detection. This problem is corrected in
the new revision of the chip. At 8Gbps over 10 meters, the
time window for error-free detection is 50ps. This window is
measured by connecting the receive and transmit PLLs to two
different clock sources and varying the delay of one clock
source until an error is detected.

The receiver data-recovery PLL requires that the input 
symbols have a minimum peak-to-peak swing of 800mV 
differential (400mV swing on each line) to acquire lock, and 
600mV differential swing to maintain lock. This PLL has a
capture range of >20MHz for a symbol stream with one
transition per cycle (5 symbols). The frequency acquisition circuit
switches the loop control to the data phase detector when 
there is less than 100kHz frequency difference between the 
transmitter and receiver reference clocks. The receive PLL has 
a jitter of 28ps (p-p) and 4ps (rms), when locked to the 
incoming data signal. 

The chip occupies 2mm x 2mm of die area. The transceiver
die photo is shown in Fig. 7.



Table 1 Performance Summary

Transmitter performance
  Maximum transmitter rate        8Gbps @ 3V, 10Gbps @ 3.3V
  Output jitter @ 8Gbps           11ps (p-p), 2ps (rms)
  Max. eye opening @ 8Gbps        350mV, 110ps (10-m cable)
  Max. eye opening @ 10Gbps       200mV, 90ps (10-m cable)

Receiver performance
  Maximum receive rate            8Gbps @ 3V, 9Gbps @ 3.3V
  Data PLL jitter @ 8Gbps         28ps (p-p), 4ps (rms)
  Data PLL capture range          >20MHz
  Min. swing to capture lock      800mV (p-p) differential
  Min. swing to maintain lock     600mV (p-p) differential
  Data PLL dynamics               BW >30MHz, phase margin >48°

Power dissipation @ 8Gbps, 3V
  Output driver                   220mW
  Analog (2 PLLs)                 750mW
  Input samplers and logic        130mW
  Total                           1100mW




Fig. 7 The transceiver die photograph 



Conclusions 

Using parallelism, 4-PAM modulation, and analog
transmit and receive FIR filters makes data rates of 8Gbps
achievable in conventional CMOS technology over long
copper cables. Performance is further enhanced by a novel
high-bandwidth linear data-recovery PLL with zero systematic
offset, which reduces the bit error rate due to random phase
errors. A new frequency detector design guarantees frequency
acquisition of the data-recovery PLL under all process
variations.